It can be done using (GNU) grep, too: grep -o '^[^|]*'
Edit: If you don't want trailing spaces but want to allow leading spaces, or spaces in the middle of the first field, you could change the command to: grep -o '^[^|]*[^| ]'
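To see the difference between the two patterns, here is a minimal sketch. It assumes GNU grep and an invented sample line; the helper names with_trailing and trimmed are hypothetical and not part of the answer.

```shell
# Sketch only: assumes GNU grep; the sample line below is invented.
# with_trailing: everything before the first "|", trailing space included.
with_trailing() { grep -o '^[^|]*'; }
# trimmed: same, but the match must end on a character that is neither "|" nor a space.
trimmed() { grep -o '^[^|]*[^| ]'; }

sample='some name | zzzzzzz | 23'
# The "<end" marker appended by sed makes any trailing space visible.
printf '%s\n' "$sample" | with_trailing | sed 's/$/<end/'
printf '%s\n' "$sample" | trimmed | sed 's/$/<end/'
```

The second pattern still keeps spaces inside the first field, because only its last character is restricted.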
You leave some trailing whitespace behind (which may or may not be a problem). But +1 for making it happen with just grep. – mu is too short Jun 17 at 18:52
@mu is too short: I added one character to my pattern (will work if the text field does not include spaces). – bmk Jun 17 at 18:59
But can you make it work with only grep if there are spaces in the first field? – mu is too short Jun 17 at 19:05
@mu: I updated my answer. – bmk Jun 17 at 19:14
Nice, your grep-fu is strong. Shame that the question keeps changing though. – mu is too short Jun 17 at 20:03
This looks like a job for sed: sed 's/\(.*\) |.*| \(.*\) |.*|/\1 \2/' filename or sed 's/ |[^|]*|//g' filename
EDIT: The revised question is even easier: sed 's/ |.*//' filename
You might even be able to get away with sed 's/ .*//' filename but that's really pushing it.
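A minimal sketch of why the last variant is "pushing it", using invented sample lines. The function names cut_at_pipe and cut_at_space are hypothetical.

```shell
# Sketch only: sample lines are invented; the real file layout is an assumption.
# cut_at_pipe: delete from the first " |" to the end of the line.
cut_at_pipe() { sed 's/ |.*//'; }
# cut_at_space: delete from the first space to the end of the line.
cut_at_space() { sed 's/ .*//'; }

printf '%s\n' 'id@host.com | zzzzzzz' | cut_at_pipe
printf '%s\n' 'some name | zzzzzzz' | cut_at_pipe
# A first field containing a space is cut short by the space-based variant.
printf '%s\n' 'some name | zzzzzzz' | cut_at_space
```

The space-based command is only safe when the first field never contains a space.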
Don't downvote this answer -- the question changed; probably was not formatted correctly initially. – Sai Jun 17 at 18:27
@Sai: (Thanks, very sporting of you!) – Beta Jun 17 at 18:56
Seemed like the question got edited -- or maybe I am losing it :) If all you need is the first part up to the "|", something like the following should work: sed 's/\([^|]*\).*/\1/' filename.txt
Do you have a good tutorial on sed? I was able to use the above to output to a new file, but now I want to run some commands on the file again to help with Update 1 and Update 2 above. – ABOB Jun 17 at 19:06
You can take a look at grymoire.com/Unix/Sed.html -- if you can get your hands on the O'Reilly book on sed and awk, that will be useful too. – Sai Jun 17 at 19:09
Is there a way to sed for something like 23 | (to be more specific, a number of exactly two digits, followed by a space, followed by a |) and then replace it with nothing so those are gone? I am trying to think of ways to use the power I am discovering! – ABOB Jun 17 at 22:48
You can try something like sed 's/[0-9][0-9] |//' – Sai Jun 17 at 23:45
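One caveat on "exactly two digits": the pattern suggested in the comment above also fires on the last two digits of a longer number. The sketch below compares it with a stricter variant. The strict version is an assumption of mine, not something from the thread; it relies on sed -E (extended regular expressions), and the sample lines and function names are invented.

```shell
# Sketch only: sample lines are invented; function names are hypothetical.
# loose: the suggested pattern; also matches the tail of a longer number.
loose() { sed 's/[0-9][0-9] |//'; }
# strict: assumed variant; the two digits must sit at the start of the line
# or right after a non-digit, and that preceding character is kept via \1.
strict() { sed -E 's/(^|[^0-9])[0-9]{2} \|/\1/'; }

printf '%s\n' '23 | zzzzzzz' | loose
printf '%s\n' '123 | zzzzzzz' | loose
printf '%s\n' '123 | zzzzzzz' | strict
```

With the strict variant a three-digit number is left alone, while a standalone two-digit number followed by " |" is still removed.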
With perl... for huge files... use Tie::File; tie @array, 'Tie::File', 'file.path/file.name' or die; for (@array) { s/^([^\|]+).*/$1/; } untie @array;
– ABOB Jun 17 at 19:08
What about the updates I posted above, where the file could be written a few different ways and I want to create a single perl script to parse each line, decide what format it is in and output the 'zzzzzzz' field? – ABOB Jun 17 at 19:36
@ABOB - Depends! But I wouldn't load 10M lines in memory unless for a good reason or a short time! – cirne100 Jun 17 at 20:00
@ABOB - take a look at this link! With s/^([^\|]+).*/$1/; you will get "all until the first |". – cirne100 Jun 17 at 20:05
&& print "$1\n" }' input.txt > output.txt
Should work flawlessly, unless the first entry may contain |.
What about the updates I posted above, where the file could be written a few different ways and I want to create a single perl script to parse each line, decide what format it is in and output the 'zzzzzzz' field? Your one-liner does work on my original test case. – ABOB Jun 17 at 19:34
@ABOB Well, in that case, you're going to have to give us some rules about the 'zzzzz' field. Is it only a-z letters? What characters can be in there? – TLP Jun 17 at 19:42
zzzzz can be most any character as it represents a key/password so to speak. – ABOB Jun 17 at 19:48
See Update 3 for a comprehensive idea. – ABOB Jun 17 at 19:51
@ABOB "any character" means it can be numbers or | or id@host.tld, which means your regex can never be completely reliable. In other words, you're screwed. – TLP Jun 17 at 19:53
It would be pretty simple in perl. You can do a split on " | " to get an array for each line. Then open a file to write, and write "$array[0]\n". Your program would look something like:

    open IN, '<', "infile.txt";   # placeholder name: the input file name was lost from the original post
    @lines = <IN>;
    close IN;
    open OUT, '>', "outfile.txt";
    foreach (@lines) {
        chomp;
        @array = split /\s*\|\s*/, $_;   # split the line on "|" with optional surrounding whitespace
        print OUT $array[0] . "\n";      # write only the first field
    }
    close OUT;

For your updates: split is a function that takes a pattern and an expression and returns a list of strings. In the example above, the pattern is a regular expression. \s is a whitespace character, \| is "|". So it's saying: split on zero or more whitespace characters (\s*), a pipe (\|) and zero or more whitespace characters (\s*).
Update 1 would look like: @array = { [0] => "id@host.com" [1] => "zzzzzzzzzz" }
Update 2 would look like: @array = { [0] => "some Number" [1] => "zzzzzzzzzz" [2] => "id@host.com" }
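The same splitting idea can be tried straight from the shell. The sketch below swaps in awk for Perl's split, so it is an alternative technique rather than the answer's own method. The function name field is hypothetical, and the sample lines are modeled on the two update formats.

```shell
# Sketch only: awk stands in for Perl's split; sample lines are invented.
# -F sets the field separator to a regex: optional spaces, a literal pipe,
# optional spaces. Unlike \s in Perl, this matches spaces only, not tabs.
# The first argument picks which field to print (1-based, unlike Perl's 0-based index).
field() { awk -F ' *[|] *' -v n="$1" '{ print $n }'; }

printf '%s\n' 'id@host.com | zzzzzzzzzz' | field 1
printf '%s\n' 'some Number | zzzzzzzzzz | id@host.com' | field 2
```

Picking the field by position only works once you know which format a line is in, which is the same limitation raised in the comments.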
Nate - I get a few errors for lines 9 and 10. Is it something to do with the '' in the split? The lines don't seem to tokenize in my editor but I don't see the offending item. – ABOB Jun 17 at 18:26
The line with split should read: my $array = split /\s*\|\s*/; – pavel Jun 17 at 18:33
@pavel - Thanks, that results in "Use of implicit split to @_ is deprecated at pw.pl line 11." and "No comma allowed after filehandle at pw.pl line 12." – ABOB Jun 17 at 18:36
This is a bad way to parse a text file with 10 million lines! The Tie::File module should be used in such cases! – cirne100 Jun 17 at 18:38
@ABOB Oops! I fixed the answer... As for the error, I changed $array = split... to @array = split... The way I had it, I was clobbering @_. I think sed is the way to go though. – Nate Jun 17 at 18:54